The objective of this project is to forecast lettuce demand for four individual restaurants owned by a large US fast-food retail chain that makes submarine sandwiches (subs), in order to enable managers to make inventory replenishment decisions. To this end, ARIMA and Holt-Winters models are employed and evaluated, and used to forecast demand over the two-week window from 06/16/2015 to 06/29/2015. The data in this window are unseen, and the performance of the models will be judged by Mean Squared Deviation.
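For reference, Mean Squared Deviation is simply the average of the squared forecast errors; a minimal sketch in R (the function name msd is illustrative and not part of the analysis below):

```r
# Mean Squared Deviation: average squared difference between
# actual and forecast values
msd <- function(actual, forecast) {
  mean((actual - forecast)^2)
}

msd(c(100, 120, 110), c(95, 125, 105))  # returns 25
```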
The data are contained in 11 different datasets, ranging from POS transactional data to recipes, portion amounts and ingredients:
pos_ordersale: Includes franchisee point of sale transaction data, “MD5KEY_ORDERSALE” a unique identifier of the order in the POS, date and the store number
menuitem: Purchased menu item associated with a single transaction in pos_ordersale. Includes the quantity and price of item purchased, Id (unique identifier for associated menu item in menu_items),“MD5KEY_MENUITEM” a unique identifier for the purchased menu item, “MD5KEY_ORDERSALE” foreign key from pos_ordersale table
menu_items: Menu item associated with a franchisee recipe. Contains RecipeId (a unique identifier for the recipe, matching the recipes table), PLU (unique identifier for the recipe associated with this item) and MenuItemId
recipes: Table contains the recipe for a menu item
store_restaurant: Includes data on the store’s/restaurant’s location
recipe_ingredient_assignments: Includes the IngredientId for each recipe and the quantity of each specific ingredient used in that recipe
recipe_sub_recipe_assignments: Recipe for a specific sub
sub_recipes: Sub-recipe associated with a recipe
sub_recipe_ingr_assignments: Contains the ingredient within a sub-recipe
ingredients: An ingredient used in a franchisee recipe
portion_uom_types: Portion (or unit of measurement) for an ingredient
These datasets are read in:
ingredients<-read.csv(file="/Users/arma/Desktop/Data/ingredients.csv", header= TRUE, sep=",")
menu_items<-read.csv(file="/Users/arma/Desktop/Data/menu_items.csv", header= TRUE, sep=",")
menuitem<-read.csv(file="/Users/arma/Desktop/Data/menuitem.csv", header= TRUE, sep=",")
portion_uom_types<-read.csv(file="/Users/arma/Desktop/Data/portion_uom_types.csv", header= TRUE, sep=",")
pos_ordersale<-read.csv(file="/Users/arma/Desktop/Data/pos_ordersale.csv", header= TRUE, sep=",")
recipe_ingredient_assignments<-read.csv(file="/Users/arma/Desktop/Data/recipe_ingredient_assignments.csv", header= TRUE, sep=",")
recipe_sub_recipe_assignments<-read.csv(file="/Users/arma/Desktop/Data/recipe_sub_recipe_assignments.csv", header= TRUE, sep=",")
recipes<-read.csv(file="/Users/arma/Desktop/Data/recipes.csv", header= TRUE, sep=",")
sub_recipe_ingr_assignments<-read.csv(file="/Users/arma/Desktop/Data/sub_recipe_ingr_assignments.csv", header= TRUE, sep=",")
sub_recipes<-read.csv(file="/Users/arma/Desktop/Data/sub_recipes.csv", header= TRUE, sep=",")
library(readxl)
store_restaurant<-read_excel("/Users/arma/Desktop/Data/store_restaurant.xlsx", col_names = TRUE)
The aim of this section is to calculate the total quantity of lettuce demanded each day at each store. This is done as follows:
lettuce_recipes<-recipe_ingredient_assignments[recipe_ingredient_assignments$IngredientId==27,]
subs_with_lettuce<-sub_recipe_ingr_assignments[sub_recipe_ingr_assignments$IngredientId==27,] #Lettuce only
subs_lettuce_recipes<- inner_join(subs_with_lettuce,recipe_sub_recipe_assignments, by="SubRecipeId") #In order to get RecipeId and Factor
subs_lettuce_recipes<- subs_lettuce_recipes %>% mutate(lettuce_weight=Quantity*Factor) #Total amount of lettuce in each sub recipe is given by factor * quantity
aggregated_subs_lettuce<- subs_lettuce_recipes %>% group_by(RecipeId,IngredientId) %>% summarise(lettuce_used=sum(lettuce_weight)) #Sums up lettuce used for each RecipeId
colnames(aggregated_subs_lettuce)[3]<-"Quantity" #Rename so as to match the titles in lettuce_recipes
lettuce_recipes<- as.data.frame(lettuce_recipes)
aggregated_subs_lettuce<- as.data.frame(aggregated_subs_lettuce)
all_lettuce_recipes<-rbind(lettuce_recipes, aggregated_subs_lettuce)
total_lettuce_recipes<- all_lettuce_recipes %>% group_by(RecipeId) %>% summarise(lettuce_quantity=sum(Quantity))
menu<-inner_join(menuitem, menu_items, by = c("Id" = "MenuItemId")) #In order to get Quantity Purchased
lettuce_menu_item<-inner_join(menu, total_lettuce_recipes, by="RecipeId")
lettuce_menu_item$lettuce_demand<- lettuce_menu_item$Quantity * lettuce_menu_item$lettuce_quantity
lettuce_menu_item$date<-as.Date(lettuce_menu_item$date, format="%y-%m-%d") #Make date column a date object
cali1<- lettuce_menu_item[lettuce_menu_item$StoreNumber==46673,]
cali2<- lettuce_menu_item[lettuce_menu_item$StoreNumber==4904,]
ny1<- lettuce_menu_item[lettuce_menu_item$StoreNumber==12631,]
ny2<- lettuce_menu_item[lettuce_menu_item$StoreNumber==20974,]
#Aggregate lettuce demanded
cali1<- cali1 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))
cali2<- cali2 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))
ny1<- ny1 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))
ny2<- ny2 %>% arrange(date) %>% group_by(date) %>% summarise(aggregate_weight=sum(lettuce_demand))
Upon inspecting NY2, it can be seen that there are time gaps in the first six rows. These non-consecutive days would make fitting a time series model problematic; therefore, the six rows are removed.
ny2<-ny2[-c(1:6),]
This section begins with a qualitative assessment of the time series plot for each store, looking for evidence of seasonality, trend, cyclicity and stationarity.
p1<- cali1 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("Cali1") +ylab("Lettuce Demand")
p2<- cali2 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("Cali2") +ylab("Lettuce demand")
p3<- ny1 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("NY1") +ylab("Lettuce demand")
p4<- ny2 %>% ggplot(aes(x=date,y=aggregate_weight)) + geom_line() + ggtitle("NY2") +ylab("Lettuce demand")
grid.arrange(p1, p2, p3, p4, nrow = 2)
1. Cali1 Store: Given that the available data span only March to June 2015, there is not enough data to determine whether there is in fact cyclicity, so cyclicity is subsequently ignored. Additionally, there appears to be little to no evidence of trend in the plot of Cali1 (no overall increase). However, within the months there is evidence of weekly seasonality (a frequency of 7 days). Finally, the mean and the variance appear roughly constant, implying trend stationarity.
2. Cali2 Store: Similar to the Cali1 Store above, Cali2 exhibits no observable trend and there appears to be weekly seasonality also. As above, mean and variance appear to be roughly constant, implying trend stationarity.
3. NY1: By contrast, the mean and variance for NY1 do not appear constant, and the series consequently appears non-stationary. Seasonality is also evident.
4. NY2: In NY2, mean and variance are roughly constant, indicating trend stationarity.
For fitting the ARIMA models, the Box-Jenkins procedure is followed:
Identification: Pre-processing of the data to make it stationary - stationarity is indicated by the autocorrelations decaying to zero quickly. Once the time series is stationary, an ARIMA model is fit using indicative orders suggested by the ACF and PACF.
Estimation: The parameters are decided based on information criteria (AIC/BIC).
Verification: Model fit to the data is determined by performing residual analyses - visual inspection of the distribution of residuals as well as formal tests such as Ljung-Box.
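The three steps above can be sketched end-to-end with the forecast package; the series y here is simulated purely for illustration and is not one of the store datasets:

```r
library(forecast)

set.seed(1)
y <- ts(rnorm(98, mean = 150, sd = 20), frequency = 7)  # placeholder daily series

# Identification: number of (seasonal) differences needed to stationarise
ndiffs(y)
nsdiffs(y)

# Estimation: search candidate orders by information criterion
fit <- auto.arima(y, ic = "aicc", stepwise = FALSE)

# Verification: residual diagnostics, including the Ljung-Box test
checkresiduals(fit)
```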
Train/Test Split: At the start of this analysis, the data are split into training and test sets. The Cali1 dataset has 103 observations and subsequent analyses are performed with an 80/20 train-test split. Consequently, the models are trained on data from the 5th of March 2015 to the 25th of May 2015 (the 1st observation to the 82nd). As noted above, given that the data are sampled daily with weekly seasonality, the time series is formulated with a frequency of 7.
cali1_train<-cali1[1:82,] #Train on the first 82 observations
cali1_train<-ts(cali1_train[,2],frequency = 7, start=c(03,05)) #Make ts object
cali1_test<-cali1[83:dim(cali1)[1],] #Test on from 83rd to last
cali1_test<-ts(cali1_test[,2],frequency = 7, start=c(15,3)) #Make ts object
The plots of the timeseries, ACF and PACF are observed:
ggtsdisplay(cali1_train) #Look at timeseries, ACF, PACF plots for Cali1 Store
As seen in the ACF plot above, there are spikes occurring at lags that are multiples of 7 (7, 14 and 21), reinforcing the earlier conclusion that seasonality occurs with a period of 7 days.
In section 3 above, it was assumed that the Cali1 timeseries was trend stationary. This is verified by utilising the ndiffs() function in order to determine the number of first-order differences required to stationarise the timeseries - as expected, this returns 0.
ndiffs(cali1_train) #Number of first order differences required to stationarise the time series.
## [1] 0
From observing the time series above, the magnitude of the seasonal fluctuations does not vary with the level of the time series. Therefore, the seasonal component is assumed to be additive, and the nsdiffs() function is used to find the number of seasonal differences required.
nsdiffs(cali1_train) #Number of seasonal differences required to stationarise the time series.
## [1] 1
As seen above, one seasonal difference needs to be taken.
cali1_ts_diff<- diff(cali1_train, lag = 7) #One seasonal difference
This new differenced object is visually observed:
autoplot(cali1_ts_diff) + labs(y="Differenced Lettuce Demand for Cali 1")
As seen above, the plot appears stationary with the seasonal component removed. Stationarity is formally tested by applying the Augmented Dickey-Fuller (ADF), Phillips-Perron (PP) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. The ADF and PP tests have the null hypothesis that there is a unit root in the time series, i.e. that it is non-stationary. Conversely, the KPSS test has the null hypothesis that the time series is trend stationary.
adf.test(cali1_ts_diff) #ADF Test
## Warning in adf.test(cali1_ts_diff): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: cali1_ts_diff
## Dickey-Fuller = -4.1898, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
pp.test(cali1_ts_diff) #PP Test
## Warning in pp.test(cali1_ts_diff): p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: cali1_ts_diff
## Dickey-Fuller Z(alpha) = -70.036, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(cali1_ts_diff) #KPSS Test
## Warning in kpss.test(cali1_ts_diff): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: cali1_ts_diff
## KPSS Level = 0.21189, Truncation lag parameter = 3, p-value = 0.1
As seen, the null hypothesis is rejected for the ADF and PP tests, while the null cannot be rejected for the KPSS at a significance level of 5%. Thus, it is concluded that the time series is adequately stationarised.
With the now stationarised time series, the next step in the analysis is to determine the orders of the AR and MA components from the ACF and PACF. The PACF is expected to indicate the order of the AR component, while the ACF is expected to indicate that of the MA component. As no first-order (trend) differences were required, potential models will be of the form ARIMA(p,0,q)(P,1,Q)[7].
cali1_acf<-acf(cali1_ts_diff, lag.max = 21, plot= FALSE) #ACF of Differenced Data
plot(cali1_acf, main= "Cali 1 Store ACF")
cali1_pacf<-pacf(cali1_ts_diff,lag.max = 21, plot=FALSE) #PACF of Differenced Data
plot(cali1_pacf, main= "Cali 1 Store PACF")
In the ACF plot, the only significant lag is at 7. In the PACF plot, beyond the significant lag at 7, an exponential decay pattern can be seen in the subsequent seasonal lags. Of the ~21 lags shown in the plots, at a 95% confidence level one would expect roughly one spike per plot to be erroneously significant. For this reason, the marginally significant lags (lag 11 in the ACF plot and lag 15 in the PACF plot) are ignored. Consequently, P and Q are expected to be less than or equal to 1, providing support for a candidate model such as ARIMA(0,0,0)(1,1,1)[7].
The auto.arima function is then used in order to select the optimal orders based on the information criteria.
auto.arima(cali1_train, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic"))
## Series: cali1_train
## ARIMA(1,0,0)(0,1,1)[7]
##
## Coefficients:
## ar1 sma1
## 0.1793 -0.6995
## s.e. 0.1187 0.1229
##
## sigma^2 estimated as 699.8: log likelihood=-353.43
## AIC=712.85 AICc=713.19 BIC=719.81
The “best” model is given by ARIMA(1,0,0)(0,1,1)[7]. The seasonal portion is easily reconciled with the analysis above, as there is a spike in the ACF at lag 7 and an exponential decay pattern in the seasonal lags of the PACF. However, the p=1 component is a surprise, as one would expect significant spikes in the early lags of the PACF to support it. Nonetheless, this model is selected as a candidate model to be tested further.
The above results, based on the information criteria, are summarised as follows:
Best Model: ARIMA(1,0,0)(0,1,1)[7]
Second Best Model: ARIMA(0,0,1)(0,1,1)[7]
Since the model expected from observing the ACF and PACF plots is amongst the top performers, it is also selected as a candidate model.
#Coding the three candidate models
cali.m1<-Arima(cali1_train, order = c(1,0,0), seasonal = list(order = c(0, 1, 1), period=7))
cali1.m2<-Arima(cali1_train, order = c(0,0,1), seasonal = list(order = c(0, 1, 1), period=7))
cali1.m3<-Arima(cali1_train, order = c(0,0,0), seasonal = list(order = c(1, 1, 1), period=7))
In Sample Performance
This section begins with an evaluation of the in-sample performance of the models above:
cali1_in_sample<-as.data.frame(accuracy(cali.m1))
cali1_in_sample<-rbind(cali1_in_sample,accuracy(cali1.m2))
cali1_in_sample<-rbind(cali1_in_sample, accuracy(cali1.m3))
cali1_in_sample$Model<- c("Model 1", "Model 2", "Model 3")
cali1_in_sample<- cali1_in_sample %>% select(Model, everything())
row.names(cali1_in_sample) <- NULL
cali1_in_sample %>% kable(caption = "In Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Model 1 | -0.7949027 | 24.95910 | 18.28022 | -2.980306 | 13.47389 | 0.6903407 | 0.0052265 |
| Model 2 | -0.8097159 | 24.98450 | 18.30272 | -2.989476 | 13.52350 | 0.6911902 | 0.0072108 |
| Model 3 | -1.2879451 | 25.54684 | 18.88651 | -3.500201 | 14.10580 | 0.7132369 | 0.1701267 |
From the table above, it is seen that Model 1 performs the best on all metrics. This outcome is not surprising, as this was the model selected by a range of information criteria, including BIC. These criteria, especially the BIC, select models based on a balance between fit and parsimony; consequently, one would expect the model selected by this procedure to exhibit the best in-sample performance.
Nonetheless, the out of sample performance of these three models will be tested in order to evaluate how well they generalise on unseen data.
Out of Sample Performance
When evaluating the out-of-sample performance of ARIMA models, there are two approaches: one based on one-step-ahead forecasts and another based on multi-step-ahead forecasts. In reality, as lettuce is fresh produce and there are likely to be many suppliers, one could argue that fast-food chains are unlikely to place orders for lettuce too far in advance; this reflects a supply chain that is likely to be quite responsive. However, as the output of this report is a forecast for the 14 days following the end of the available data, model performance on the multi-step-ahead forecast is prioritised.
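For comparison, one-step-ahead test errors can be obtained by passing an already-trained model to Arima() via its model argument, which reuses the estimated coefficients without re-fitting. A self-contained sketch on simulated data (the series and orders here are illustrative, not the store data):

```r
library(forecast)

set.seed(2)
train <- ts(rnorm(82, mean = 150, sd = 20), frequency = 7)
test  <- ts(rnorm(21, mean = 150, sd = 20), frequency = 7, start = c(12, 6))

fit <- Arima(train, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 7))

# Multi-step: forecast the whole test horizon at once
multi_step <- forecast(fit, h = length(test))
accuracy(multi_step, test)

# One-step-ahead: apply the trained model to the test series,
# reusing the coefficients (model = fit skips re-estimation)
one_step <- Arima(test, model = fit)
accuracy(fitted(one_step), test)
```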
#Out of sample performance
cali1_out_sample<-as.data.frame(accuracy(forecast(cali.m1, h = 20), cali1_test))
cali1_out_sample<-rbind(cali1_out_sample,accuracy(forecast(cali1.m2, h = 20), cali1_test))
cali1_out_sample<-rbind(cali1_out_sample,accuracy(forecast(cali1.m3, h = 20), cali1_test))
cali1_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2","Model 3","Model 3")
cali1_out_sample<- cali1_out_sample %>% select(Model, everything())
cali1_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| | Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | -0.7949027 | 24.95910 | 18.28022 | -2.980306 | 13.47389 | 0.6903407 | 0.0052265 | NA |
| Test set | Model 1 | 7.6579755 | 32.69719 | 23.53935 | 1.828789 | 17.15170 | 0.8889481 | 0.1942585 | 0.4717754 |
| Training set | Model 2 | -0.8097159 | 24.98450 | 18.30272 | -2.989476 | 13.52350 | 0.6911902 | 0.0072108 | NA |
| Test set | Model 2 | 7.5579040 | 32.87697 | 23.56221 | 1.772784 | 17.17768 | 0.8898115 | 0.1947272 | 0.4746790 |
| Training set | Model 3 | -1.2879451 | 25.54684 | 18.88651 | -3.500201 | 14.10580 | 0.7132369 | 0.1701267 | NA |
| Test set | Model 3 | 11.1635471 | 36.15826 | 25.96754 | 4.386626 | 18.53260 | 0.9806472 | 0.1981617 | 0.5252240 |
#Plot Forecasts
cali1_m1_plot<-autoplot(forecast(cali.m1, h = 20))
cali1_m2_plot<-autoplot(forecast(cali1.m2, h = 20))
cali1_m3_plot<-autoplot(forecast(cali1.m3, h = 20))
grid.arrange(cali1_m1_plot, cali1_m2_plot, cali1_m3_plot, nrow = 2)
As expected, the performance of all the models is worse on the test set than on the training set. Model 1 continues to perform the best across most of the evaluation metrics. However, special attention is paid to the RMSE due to its sensitivity to large errors. For this reason, Model 1 is selected over the other models evaluated.
However, before Model 1 is formally selected and used to make predictions for the subsequent 14 days, its residuals are checked:
#Check Residuals
checkresiduals(cali.m1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(0,1,1)[7]
## Q* = 15.376, df = 12, p-value = 0.2215
##
## Model df: 2. Total lags used: 14
Ideally, there should be no autocorrelation in the residuals; this is evidenced by a white-noise pattern in the residual time series, no significant spikes in the ACF plot, and normally distributed residuals with mean 0. Furthermore, these checks are supported by the Ljung-Box test, whose null hypothesis is that the model does not show a lack of fit.
The residual time series above generally exhibits a white-noise-like pattern, although there appears to be a slight downward trajectory towards the end. There are two significant lags in the ACF plot; on account of random variation and how little they exceed the threshold, these are deemed acceptable. Finally, the Ljung-Box test returns a p-value of 0.2215, indicating that the null hypothesis cannot be rejected. Model 1 is therefore considered satisfactory and will be compared against the Holt-Winters model to decide which should be used for the final prediction.
Train/Test Split: As before, the dataset is split into a training and test set. There are 95 observations in the Cali2 dataset and subsequent analyses are performed with an 80/20 train-test split. The models are trained on data from the 13th of March 2015 to the 27th of May 2015 (1st observation to the 76th).
cali2_train<-cali2[1:76,] #Train on the first 76 observations
cali2_train<-ts(cali2_train[,2],frequency = 7, start=c(03,13)) #Make ts object
cali2_test<-cali2[77:dim(cali2)[1],] #Test on from 77th to last
cali2_test<-ts(cali2_test[,2],frequency = 7, start=c(15,5)) #Make ts object
The plots of the timeseries, ACF and PACF are observed:
ggtsdisplay(cali2_train) #Look at timeseries, ACF, PACF plots for Cali2 Store
As expected, in the ACF plot there are significant spikes occurring at lag 7 and its multiples (14 and 21), confirming the weekly seasonality.
The trend stationarity of the series above is confirmed using the ndiffs() function; as expected, this returns 0.
ndiffs(cali2_train) #Number of first order differences required to stationarise the time series.
## [1] 0
nsdiffs(cali2_train)
## [1] 1
As seen above, one seasonal difference needs to be taken.
cali2_diff<-diff(cali2_train,lag=7)
This new differenced object is visually observed:
autoplot(cali2_diff) + labs(y="Differenced Lettuce Demand for Cali 2")
adf.test(cali2_diff)
##
## Augmented Dickey-Fuller Test
##
## data: cali2_diff
## Dickey-Fuller = -2.8934, Lag order = 4, p-value = 0.2121
## alternative hypothesis: stationary
pp.test(cali2_diff)
## Warning in pp.test(cali2_diff): p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: cali2_diff
## Dickey-Fuller Z(alpha) = -68.18, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(cali2_diff)
##
## KPSS Test for Level Stationarity
##
## data: cali2_diff
## KPSS Level = 0.50699, Truncation lag parameter = 3, p-value = 0.04009
The conclusion from the ADF and KPSS tests is that the time series is still not stationary. However, the PP test suggests that it is in fact stationary, as the null hypothesis is rejected with a p-value of 0.01. Observing the differenced time series plot above, there appears to be a slight downward trend, supporting the case for a trend difference to be taken.
#Take a trend difference on the already seasonally differenced cali2_diff. How many differences to take?
ndiffs(cali2_diff)
## [1] 1
#Take a trend difference on the already seasonally differenced cali2_diff
cali2_diff_2<-diff(cali2_diff, differences=1)
The ADF, PP and KPSS tests are then performed again to ensure stationarity.
#ADF Test
adf.test(cali2_diff_2)
## Warning in adf.test(cali2_diff_2): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: cali2_diff_2
## Dickey-Fuller = -5.1688, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#PP Test
pp.test(cali2_diff_2)
## Warning in pp.test(cali2_diff_2): p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: cali2_diff_2
## Dickey-Fuller Z(alpha) = -93.446, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
#PP Test
kpss.test(cali2_diff_2)
## Warning in kpss.test(cali2_diff_2): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: cali2_diff_2
## KPSS Level = 0.075925, Truncation lag parameter = 3, p-value = 0.1
As the stationarity outcome is consistent from all three tests in the case of the doubly-differenced data (cali2_diff_2), further analysis on ACF/PACF is performed.
#ACF
cali2_acf<-acf(cali2_diff_2, lag.max=21, plot=FALSE)
plot(cali2_acf, main= "Cali 2 Store ACF")
#PACF
cali2_pacf<-pacf(cali2_diff_2, lag.max=21, plot=FALSE)
plot(cali2_pacf, main= "Cali 2 Store PACF")
From the ACF plot, there are significant spikes at lag 1 and lag 7, suggesting a non-seasonal MA(1) component as well as a seasonal MA(1) component. Similarly, the significant spike at lag 1 of the PACF is suggestive of an AR(1). None of the spikes at the seasonal lags of the PACF are significant.
The auto.arima function is run in order to return the optimal model order, based on the information criteria.
auto.arima(cali2_train, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic"))
## Series: cali2_train
## ARIMA(0,1,1)(0,1,1)[7]
##
## Coefficients:
## ma1 sma1
## -0.8413 -0.5869
## s.e. 0.0720 0.1838
##
## sigma^2 estimated as 2248: log likelihood=-360.16
## AIC=726.31 AICc=726.69 BIC=732.97
The best model is given by ARIMA(0,1,1)(0,1,1)[7] and the second best by ARIMA(0,1,1)(1,1,1)[7]. This outcome is consistent with the analysis above: given the significant spike at lag 1 in the ACF, an MA(1) component is not surprising, and the seasonal MA(1) component is explained by the significant spike at lag 7 in the ACF.
The “best” model and the second best model (given by ARIMA(0,1,1)(1,1,1)[7]) are evaluated based on their in-sample and out of sample performance.
#Coding the two candidate models
cali2.m1<-Arima(cali2_train, order = c(0,1,1), seasonal = list(order = c(0, 1, 1), period=7))
cali2.m2<-Arima(cali2_train, order = c(0,1,1), seasonal = list(order = c(1, 1, 1), period=7))
In Sample Performance:
cali2_in_sample<-as.data.frame(accuracy(cali2.m1))
cali2_in_sample<-rbind(cali2_in_sample,accuracy(cali2.m2))
cali2_in_sample$Model<- c("Model 1", "Model 2")
cali2_in_sample<- cali2_in_sample %>% select(Model, everything())
row.names(cali2_in_sample) <- NULL
cali2_in_sample %>% kable(caption = "In Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Model 1 | -7.638800 | 44.18003 | 31.90097 | -3.637262 | 10.72865 | 0.7443920 | -0.1074127 |
| Model 2 | -2.973974 | 24.45665 | 18.77466 | -4.563226 | 14.41179 | 0.7090127 | 0.0736023 |
The results are a mixed bag; however, Model 2 outperforms Model 1 on RMSE. The out of sample performance of these two models is also assessed.
Out of Sample Performance:
#Out of sample performance
cali2_out_sample<-as.data.frame(accuracy(forecast(cali2.m1, h = 19), cali2_test))
cali2_out_sample<-rbind(cali2_out_sample,accuracy(forecast(cali2.m2, h = 19), cali2_test))
cali2_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2")
cali2_out_sample<- cali2_out_sample %>% select(Model, everything())
cali2_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| | Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | -7.638800 | 44.18003 | 31.90097 | -3.637262 | 10.72865 | 0.7443920 | -0.1074127 | NA |
| Test set | Model 1 | 71.572809 | 94.50028 | 74.24301 | 21.922535 | 23.10160 | 1.7324206 | -0.3237485 | 1.059464 |
| Training set | Model 2 | -2.973974 | 24.45665 | 18.77466 | -4.563226 | 14.41179 | 0.7090127 | 0.0736023 | NA |
| Test set | Model 2 | 194.415590 | 200.60710 | 194.41559 | 65.133028 | 65.13303 | 7.3419785 | 0.0662369 | 2.019972 |
#Plot Forecasts
cali2_m1_plot<-autoplot(forecast(cali2.m1, h = 19))
cali2_m2_plot<-autoplot(forecast(cali2.m2, h = 19))
grid.arrange(cali2_m1_plot, cali2_m2_plot, nrow = 1)
As Model 1 outperforms Model 2 on the out of sample test, it is selected. Finally, its residuals are checked.
checkresiduals(cali2.m1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,1,1)[7]
## Q* = 16.008, df = 12, p-value = 0.1909
##
## Model df: 2. Total lags used: 14
Model 1 provides a satisfactory fit to the data: the residual time series displays a white-noise pattern and the residuals are normally distributed with mean zero. The spike at lag 8 crosses the threshold; with about 21 lags shown, roughly one spike is expected to be erroneously significant, so this is attributed to random error. Finally, this is supported by the large Ljung-Box p-value of about 0.2.
Train/Test Split: The 103 observations are split into a training/test set using the 80-20 split.
ny1_train<-ny1[1:82,] #Train on first 82 observations
ny1_train<-ts(ny1_train[,2],frequency = 7, start=c(03,05)) #Make ts object
ny1_test<-ny1[83:dim(ny1)[1],] #Test on from 83rd to last
ny1_test<-ts(ny1_test[,2],frequency = 7, start=c(15,3)) #Make ts object
The plots of the timeseries, ACF and PACF are observed:
ggtsdisplay(ny1_train) #Look at timeseries, ACF, PACF plots for NY1 Store
As mentioned earlier, the mean and variance of this time series do not appear constant (in fact, they increase over time), which is indicative of a multiplicative seasonal component. This therefore appears to be a candidate for a Box-Cox/logarithmic transformation.
ny1_train_transformed<-log(ny1_train)
autoplot(ny1_train_transformed) + labs(y="Log(Lettuce Demand for NY 1)")
The magnitude of the variations appears more stable after the transformation.
#How many differences need to be taken in order to stationarise
ndiffs(ny1_train_transformed)
## [1] 1
#How many seasonal differences need to be taken in order to stationarise
nsdiffs(ny1_train_transformed)
## [1] 0
Consequently, the timeseries is differenced as follows:
#How many differences need to be taken in order to stationarise
ny1_train_transformed_diff<-diff(ny1_train_transformed, differences = 1)
The stationarity tests are performed to ensure the series is in fact stationary.
adf.test(ny1_train_transformed_diff)
## Warning in adf.test(ny1_train_transformed_diff): p-value smaller than printed p-
## value
##
## Augmented Dickey-Fuller Test
##
## data: ny1_train_transformed_diff
## Dickey-Fuller = -7.2516, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
pp.test(ny1_train_transformed_diff)
## Warning in pp.test(ny1_train_transformed_diff): p-value smaller than printed p-
## value
##
## Phillips-Perron Unit Root Test
##
## data: ny1_train_transformed_diff
## Dickey-Fuller Z(alpha) = -104.45, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(ny1_train_transformed_diff)
## Warning in kpss.test(ny1_train_transformed_diff): p-value greater than printed
## p-value
##
## KPSS Test for Level Stationarity
##
## data: ny1_train_transformed_diff
## KPSS Level = 0.023637, Truncation lag parameter = 3, p-value = 0.1
All three tests indicate that the time series has been stationarised.
Possible models will be of the form ARIMA(p,1,q)(P,0,Q)[7]. To identify possible orders, the ACF and PACF plots are analysed.
#ACF
ny1_acf<-acf(ny1_train_transformed_diff, lag.max=21, plot=FALSE)
plot(ny1_acf, main= "NY 1 Store ACF")
#PACF
ny1_pacf<-pacf(ny1_train_transformed_diff, lag.max=21, plot=FALSE)
plot(ny1_pacf, main= "NY 1 Store PACF")
From the ACF plot, the significant spikes at lag 1 and lag 7 are indicative of orders q and Q of <=1. From the PACF plot, the lags appear to stop being significant after lag 6; consequently, orders of p <=5 are expected.
The auto.arima function is then used in order to select the optimal orders based on the information criteria.
auto.arima(ny1_train, lambda = 0, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic"))
## Series: ny1_train
## ARIMA(0,1,1)(2,0,0)[7]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ma1 sar1 sar2
## -0.9190 0.2788 0.2046
## s.e. 0.0445 0.1127 0.1201
##
## sigma^2 estimated as 0.02328: log likelihood=37.44
## AIC=-66.88 AICc=-66.35 BIC=-57.3
The best model is given by ARIMA(0,1,1)(2,0,0)[7] and the second best model is given by ARIMA(0,1,1)(1,0,0)[7]. These two candidate models will be evaluated, in order to judge their out of sample performance.
Bias Adjustment: An issue with the Box-Cox transformation is that the back-transformed point forecast is the median of the forecast distribution rather than the mean. Considering that the fast-food chain would most likely want to add up the forecast demand across restaurants to plan lettuce delivery/inventory, the mean is preferred, since means add up while medians do not. This is achieved by setting the biasadj argument of the Arima function to TRUE.
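For the log transformation (lambda = 0), if w is the point forecast on the log scale and sigma2 its forecast variance, the naive back-transform exp(w) targets the median, while the bias-adjusted mean is approximately exp(w)*(1 + sigma2/2); this is the adjustment biasadj applies. A quick numerical illustration with hypothetical values:

```r
w      <- 5.0   # hypothetical point forecast on the log scale
sigma2 <- 0.04  # hypothetical forecast variance on the log scale

median_bt <- exp(w)                     # naive back-transform (median), ~148.41
mean_bt   <- exp(w) * (1 + sigma2 / 2)  # bias-adjusted mean, ~151.38
```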
ny1.m1<-Arima(ny1_train, order = c(0,1,1), seasonal = list(order = c(2, 0, 0), period=7), lambda = 0, include.mean = FALSE, biasadj = TRUE)
ny1.m2<-Arima(ny1_train, order = c(0,1,1), seasonal = list(order = c(1, 0, 0), period=7), lambda = 0, include.mean = FALSE, biasadj = TRUE)
In Sample Performance
ny1_in_sample<-as.data.frame(accuracy(ny1.m1))
ny1_in_sample<-rbind(ny1_in_sample,accuracy(ny1.m2))
ny1_in_sample$Model<- c("Model 1", "Model 2")
ny1_in_sample<- ny1_in_sample %>% select(Model, everything())
row.names(ny1_in_sample) <- NULL
ny1_in_sample %>% kable(caption = "In Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Model 1 | 2.514712 | 39.04974 | 29.64812 | -1.0358261 | 11.50951 | 0.8337491 | -0.0233680 |
| Model 2 | 3.358640 | 39.82973 | 30.26919 | -0.8005959 | 11.68300 | 0.8512146 | -0.0043247 |
Model 1 performs best on most of the metrics, especially the RMSE. The models are subsequently evaluated on their out-of-sample performance.
Out of Sample Performance:
#Out of sample performance
ny1_out_sample<-as.data.frame(accuracy(forecast(ny1.m1, h = 20), ny1_test))
ny1_out_sample<-rbind(ny1_out_sample,accuracy(forecast(ny1.m2, h = 20), ny1_test))
ny1_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2")
ny1_out_sample<- ny1_out_sample %>% select(Model, everything())
ny1_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | 2.514712 | 39.04974 | 29.64812 | -1.0358261 | 11.50951 | 0.8337491 | -0.0233680 | NA |
| Test set | Model 1 | 11.993375 | 48.73665 | 39.11604 | 1.5721260 | 13.42434 | 1.1000011 | 0.3046582 | 0.7664256 |
| Training set1 | Model 2 | 3.358640 | 39.82973 | 30.26919 | -0.8005959 | 11.68300 | 0.8512146 | -0.0043247 | NA |
| Test set1 | Model 2 | 10.998585 | 51.56242 | 41.72472 | 0.9421743 | 14.45553 | 1.1733611 | 0.3068557 | 0.8275730 |
#Plot Forecasts
ny1_m1_plot<-autoplot(forecast(ny1.m1, h = 20))
ny1_m2_plot<-autoplot(forecast(ny1.m2, h = 20))
grid.arrange(ny1_m1_plot, ny1_m2_plot, nrow = 1)
As expected, both models perform worse on the test set. Nonetheless, Model 1 outperforms Model 2 on most performance metrics, especially the RMSE, and is therefore selected.
Finally, the residuals from Model 1 are evaluated:
checkresiduals(ny1.m1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(2,0,0)[7]
## Q* = 7.3478, df = 11, p-value = 0.7703
##
## Model df: 3. Total lags used: 14
By all indications, Model 1 provides a satisfactory fit for the data: the time series plot of the residuals displays a white noise pattern, none of the spikes in the ACF plot are significant and the residuals are normally distributed with mean zero. Finally, this is supported by the large p-value of the Ljung-Box test.
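For reference, the same Ljung-Box statistic reported by checkresiduals() can be computed directly with Box.test(), where fitdf = 3 accounts for the three estimated coefficients (ma1, sar1, sar2):

```r
# Ljung-Box test on the residuals of Model 1; 14 lags and 3 fitted parameters,
# so the statistic is compared against a chi-squared with 14 - 3 = 11 df.
Box.test(residuals(ny1.m1), lag = 14, type = "Ljung-Box", fitdf = 3)
```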
Train/Test Split: There are 88 observations in the NY2 dataset and they are split roughly 80/20 into training and test sets.
ny2_train<-ny2[1:70,] #Train on first 70 observations
ny2_train<-ts(ny2_train[,2],frequency = 7, start=c(03,20)) #Make ts object
ny2_test<-ny2[71:dim(ny2)[1],] #Test on observations 71 to last
ny2_test<-ts(ny2_test[,2],frequency = 7, start=c(15,6)) #Make ts object
The timeseries, ACF and PACF plots are observed:
ggtsdisplay(ny2_train) #Look at timeseries, ACF, PACF plots for NY2 Store
The timeseries above appears to be trend stationary. This is confirmed using the ndiffs function to see whether any differences need to be taken.
#How many differences need to be taken in order to stationarise
ndiffs(ny2_train)
## [1] 0
No differences need to be taken.
#How many seasonal differences need to be taken in order to stationarise
nsdiffs(ny2_train)
## [1] 0
Consequently, the time series appears to be stationary in terms of both trend and seasonality, and the candidate models are therefore expected to be of the form ARIMA(p,0,q)(P,0,Q)[7]. The stationarity is confirmed by performing the ADF, PP and KPSS tests.
adf.test(ny2_train) #ADF Test
##
## Augmented Dickey-Fuller Test
##
## data: ny2_train
## Dickey-Fuller = -3.4602, Lag order = 4, p-value = 0.05333
## alternative hypothesis: stationary
pp.test(ny2_train) #PP Test
## Warning in pp.test(ny2_train): p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: ny2_train
## Dickey-Fuller Z(alpha) = -47.388, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
kpss.test(ny2_train) #KPSS Test
## Warning in kpss.test(ny2_train): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: ny2_train
## KPSS Level = 0.25303, Truncation lag parameter = 3, p-value = 0.1
The KPSS and PP tests confirm the stationarity. Given that the p-value of the ADF test is only slightly above the 5% threshold, the series is considered stationary.
#ACF
ny2_acf<-acf(ny2_train, lag.max=21, plot=FALSE)
plot(ny2_acf, main= "NY 2 Store ACF")
#PACF
ny2_pacf<-pacf(ny2_train, lag.max=21, plot=FALSE)
plot(ny2_pacf, main= "NY 2 Store PACF")
The significant spikes in the ACF at lag 1 and lag 7 are suggestive of q <= 1 and Q <= 1. In the PACF plot, the spike at lag 1 is significant, while that at lag 7 appears to lie just at the significance boundary.
The auto.arima function is then used in order to select the optimal orders based on the information criteria.
auto.arima(ny2_train, trace = FALSE, approximation = FALSE, stepwise = FALSE, ic = c("aicc", "aic", "bic"))
## Series: ny2_train
## ARIMA(1,0,0)(1,0,0)[7] with non-zero mean
##
## Coefficients:
## ar1 sar1 mean
## 0.2595 0.3230 218.4136
## s.e. 0.1167 0.1195 11.0479
##
## sigma^2 estimated as 2443: log likelihood=-371.24
## AIC=750.48 AICc=751.1 BIC=759.48
The best model is given by ARIMA(1,0,0)(1,0,0)[7] and the second best by ARIMA(0,0,1)(1,0,0)[7] with non-zero mean. The AR(1) component in the “best model” is explained by the significant spike in the PACF at lag 1, while the seasonal AR(1) is explained by the spike at lag 7 in the PACF that reaches the significance boundary.
#Coding the two candidate models
ny2.m1<-Arima(ny2_train, order = c(1,0,0), seasonal = list(order = c(1, 0, 0), period=7))
ny2.m2<-Arima(ny2_train, order = c(0,0,1), seasonal = list(order = c(1, 0, 0), period=7))
ny2_in_sample<-as.data.frame(accuracy(ny2.m1))
ny2_in_sample<-rbind(ny2_in_sample,accuracy(ny2.m2))
ny2_in_sample$Model<- c("Model 1", "Model 2")
ny2_in_sample<- ny2_in_sample %>% select(Model, everything())
row.names(ny2_in_sample) <- NULL
ny2_in_sample %>% kable(caption = "In Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Model 1 | 1.304185 | 48.35115 | 38.73376 | -18.79179 | 33.82831 | 0.8172227 | -0.0171058 |
| Model 2 | 1.312231 | 48.61274 | 38.78247 | -18.94127 | 33.96957 | 0.8182503 | 0.0247216 |
As seen, Model 1 outperforms Model 2 in sample; the out-of-sample performance of both models is tested next.
#Out of sample performance
ny2_out_sample<-as.data.frame(accuracy(forecast(ny2.m1, h = 18), ny2_test))
ny2_out_sample<-rbind(ny2_out_sample,accuracy(forecast(ny2.m2, h = 18), ny2_test))
ny2_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2")
ny2_out_sample<- ny2_out_sample %>% select(Model, everything())
ny2_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | 1.304185 | 48.35115 | 38.73376 | -18.791794 | 33.82831 | 0.8172227 | -0.0171058 | NA |
| Test set | Model 1 | -2.033126 | 49.24959 | 32.28410 | -5.678179 | 15.07788 | 0.6811449 | -0.0325667 | 0.7378221 |
| Training set1 | Model 2 | 1.312231 | 48.61274 | 38.78247 | -18.941275 | 33.96957 | 0.8182503 | 0.0247216 | NA |
| Test set1 | Model 2 | -1.826844 | 49.12370 | 32.22283 | -5.550662 | 15.02731 | 0.6798520 | -0.0368099 | 0.7350267 |
The performances of Model 1 and Model 2 are quite close on the out-of-sample test. However, Model 2 is selected over Model 1 due to its slightly better performance on the RMSE.
#Plot Forecasts
ny2_m1_plot<-autoplot(forecast(ny2.m1, h = 18))
ny2_m2_plot<-autoplot(forecast(ny2.m2, h = 18))
grid.arrange(ny2_m1_plot, ny2_m2_plot, nrow = 1)
Finally, the residuals from Model 2 are evaluated:
checkresiduals(ny2.m2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(1,0,0)[7] with non-zero mean
## Q* = 8.0122, df = 11, p-value = 0.7122
##
## Model df: 3. Total lags used: 14
Model 2 provides a satisfactory fit for the data (in spite of the long tail in the histogram): the time series plot of the residuals displays a white noise pattern, none of the spikes in the ACF plot are significant and the residuals are approximately normally distributed with mean zero. Finally, this is supported by the large p-value of the Ljung-Box test.
In this portion of the analysis, the time series is modelled using the ets() function rather than the HoltWinters() function, due to its in-built optimisation of initial states and smoothing parameters via maximisation of the likelihood function. Three steps are followed:
1. Visually inspect the timeseries for any indication of trend/seasonality, and whether these components are additive or multiplicative.
2. Run the ets() function with the “model” argument set to “ZZZ” to determine the best model, and compare it with the model expected from (1).
3. Evaluate the model on in-sample and out-of-sample performance.
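To illustrate the difference between the two functions, here is a minimal sketch on simulated data (not the project data): HoltWinters() initialises its states heuristically, whereas ets() estimates the initial states and smoothing parameters jointly by maximum likelihood.

```r
library(forecast)
set.seed(1)
# Simulated weekly series with an additive seasonal pattern
y <- ts(200 + 30 * sin(2 * pi * (1:70) / 7) + rnorm(70, sd = 15), frequency = 7)
hw_fit  <- HoltWinters(y, seasonal = "additive")  # heuristic initial states
ets_fit <- ets(y, model = "ANA")                  # ML-optimised states/parameters
c(hw_alpha = unname(hw_fit$alpha), ets_alpha = unname(ets_fit$par["alpha"]))
```

The two fits typically return different smoothing parameters for the same series, which is why ets() is preferred here.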
The timeseries is decomposed into its component parts - level, trend, seasonal and errors. This is performed using Seasonal and Trend decomposition using Loess (STL). One particular advantage of using STL is the fact that it can handle a range of seasonalities - not just monthly/quarterly. Finally, the same train/test split defined above in the ARIMA section is used here.
cali1_stl<-cali1_train[,1]
stl(cali1_stl,s.window ="period") %>% autoplot
In STL plots, the grey bars indicate the relative importance of each component: the smaller the bar, the larger that component’s contribution. It can be seen that the seasonal component plays the most important part, while the large grey bar for the trend component indicates that its contribution to the original time series is marginal. Consequently, the expected ETS model is “A,N,A”, with additive error and seasonal components and no trend.
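The visual reading of the grey bars can also be cross-checked numerically. The following is a rough sketch (the variance shares are only approximate, since the STL components are not orthogonal):

```r
# Share of the series variance attributable to each STL component -
# a rough numeric counterpart to the grey bars in the plot above.
cali1_fit_stl <- stl(cali1_stl, s.window = "period")
round(apply(cali1_fit_stl$time.series, 2, var) / var(cali1_stl), 2)
```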
cali1_ets<-ets(cali1_train, model="ZZZ") #Evaluate the best model
cali1_ets
## ETS(A,N,A)
##
## Call:
## ets(y = cali1_train, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.1223
## gamma = 1e-04
##
## Initial states:
## l = 144.2834
## s = 31.9515 15.5058 26.5879 24.4611 -70.2816 -46.5225
## 18.2978
##
## sigma: 24.6121
##
## AIC AICc BIC
## 897.1487 900.2473 921.2159
As seen above, the “best model” is consistent with the expected “A,N,A”.
In/Out-Of-Sample Performance:
cali1_ets_out_sample<-as.data.frame(accuracy(forecast(cali1_ets, h = 20), cali1_test))
cali1_ets_out_sample %>% kable(caption = "In/Out-of-Sample Performance") %>% kable_styling("striped")
| ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|
| Training set | -1.861661 | 23.22218 | 18.45667 | -4.154372 | 14.30555 | 0.6970043 | 0.0562287 | NA |
| Test set | 19.416481 | 33.28287 | 25.92539 | 12.463420 | 18.22348 | 0.9790556 | 0.2157657 | 0.4540012 |
As expected, the model performance on the test set is worse than on the training set. In the next section, the performance of the Holt-Winters’ model is compared to the ARIMA.
#Plot Forecast
autoplot(forecast(cali1_ets, h = 20))
Check Residuals:
checkresiduals(cali1_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 13.685, df = 5, p-value = 0.01774
##
## Model df: 9. Total lags used: 14
The results from the residual plots above appear to be inconclusive. While the time series of the residuals does exhibit white noise-like behaviour, there is a slight downwards trajectory. The spikes are insignificant (apart from the spike at lag 11, attributable to random errors) and the distribution of the errors is approximately normal with mean zero. However, the p-value of the Ljung-Box test is small enough for the null hypothesis to be rejected, an indication that the model shows some lack of fit.
Given the inconclusive nature of the residual analysis, forecasts from this model should be approached with caution.
The timeseries is decomposed into its component parts with STL, in order to inspect the relative importance of the trend, seasonal and remainder components.
cali2_stl<-cali2_train[,1]
stl(cali2_stl,s.window ="period") %>% autoplot
From the relative size of the grey bars, it is seen that the trend component does not contribute much to the overall time series and so can be ignored. Consequently, the expected model is ETS(“A,N,A”), with additive errors and seasonality. However, due to the relatively large grey bar for the seasonal component, the ETS(“A,N,N”) model is also explored for its out-of-sample performance.
cali2_ets_1<-ets(cali2_train, model="ZZZ") #Evaluate the best model
cali2_ets_1
## ETS(A,A,A)
##
## Call:
## ets(y = cali2_train, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.0127
## beta = 0.0125
## gamma = 5e-04
##
## Initial states:
## l = 315.9648
## b = 3.6247
## s = 8.1102 50.6401 57.8214 40.4994 18.3055 -80.5061
## -94.8704
##
## sigma: 42.8103
##
## AIC AICc BIC
## 912.2838 917.2362 940.2526
The best model according to the optimisation is ETS(A,A,A). However, it can be seen that the smoothing parameter for the trend component is 0.0125, close to zero. Therefore, the ETS(“A,N,A”) and ETS(“A,N,N”) models are also explored.
cali2_ets_2<-ets(cali2_train, model="ANA") #ANA Model
cali2_ets_3<-ets(cali2_train, model="ANN") #ANN Model
cali2_ets_out_sample<-as.data.frame(accuracy(forecast(cali2_ets_1, h = 19), cali2_test))
cali2_ets_out_sample<-rbind(cali2_ets_out_sample,accuracy(forecast(cali2_ets_2, h = 19), cali2_test))
cali2_ets_out_sample<-rbind(cali2_ets_out_sample,accuracy(forecast(cali2_ets_3, h = 19), cali2_test))
cali2_ets_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2", "Model 3", "Model 3")
cali2_ets_out_sample<- cali2_ets_out_sample %>% select(Model, everything())
cali2_ets_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | -8.8667541 | 39.59116 | 30.57049 | -4.868276 | 10.79776 | 0.7133458 | -0.0497177 | NA |
| Test set | Model 1 | 89.7060900 | 101.04021 | 89.98506 | 29.590915 | 29.69424 | 2.0997529 | -0.0804477 | 1.0880721 |
| Training set1 | Model 2 | -3.9605391 | 41.26204 | 31.63586 | -3.438702 | 10.98139 | 0.7382057 | -0.0893617 | NA |
| Test set1 | Model 2 | 34.0494478 | 55.95950 | 44.43286 | 10.551570 | 14.25775 | 1.0368169 | -0.0241791 | 0.6328567 |
| Training set2 | Model 3 | -2.6510006 | 76.18909 | 62.80163 | -7.544948 | 22.51971 | 1.4654421 | 0.3718191 | NA |
| Test set2 | Model 3 | -0.1610933 | 65.95330 | 56.86687 | -4.928026 | 19.78695 | 1.3269577 | 0.1353543 | 0.7255181 |
While Model 1 performs the best on the training sample, the out-of-sample/test RMSE is actually lowest for Model 2. This indicates that Model 1 is likely to have been overfitting the training data; for this reason, Model 2 (given by A,N,A) is selected.
#Plot Forecast
autoplot(forecast(cali2_ets_2, h = 19))
checkresiduals(cali2_ets_2)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,A)
## Q* = 12.858, df = 5, p-value = 0.02475
##
## Model df: 9. Total lags used: 14
Although the p-value of the Ljung-Box test indicates that the null hypothesis can be rejected, all visual inspections of the residual time series, ACF spikes and distribution point to the suitability of the model.
STL is used in order to decompose the timeseries.
ny1_stl<-ny1_train[,1]
stl(ny1_stl,s.window ="period") %>% autoplot
It can be seen that, due to the growing variance in the data and growing error terms, a multiplicative model is more appropriate. Consequently, potential models to test include “M,A,M”, “M,N,M” and “M,A,N”, as the grey bars for the trend and seasonal components are relatively large.
ny1_ets_1<-ets(ny1_train, model="ZZZ")
ny1_ets_1
## ETS(M,N,M)
##
## Call:
## ets(y = ny1_train, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.1549
## gamma = 1e-04
##
## Initial states:
## l = 240.4624
## s = 1.0391 0.9567 0.9337 1.0025 0.8566 1.1237
## 1.0877
##
## sigma: 0.1438
##
## AIC AICc BIC
## 962.0482 965.1467 986.1153
The model from the optimisation is given by “M,N,M”. Consequently, “M,A,M” and “M,A,N” are taken as the other two candidate models to be tested for their out-of-sample performance.
ny1_ets_2<-ets(ny1_train, model="MAM")
ny1_ets_3<-ets(ny1_train, model="MAN")
ny1_ets_2
## ETS(M,Ad,M)
##
## Call:
## ets(y = ny1_train, model = "MAM")
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9762
##
## Initial states:
## l = 215.1624
## b = 1.9084
## s = 1.0242 0.9686 0.9256 1.0231 0.8399 1.134
## 1.0847
##
## sigma: 0.1381
##
## AIC AICc BIC
## 960.2097 965.5627 991.4971
ny1_ets_3
## ETS(M,A,N)
##
## Call:
## ets(y = ny1_train, model = "MAN")
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 221.016
## b = 0.8982
##
## sigma: 0.1624
##
## AIC AICc BIC
## 979.4819 980.2713 991.5155
These three models are tested for their out-of-sample performance.
ny1_ets_out_sample<-as.data.frame(accuracy(forecast(ny1_ets_1, h = 20), ny1_test))
ny1_ets_out_sample<-rbind(ny1_ets_out_sample,accuracy(forecast(ny1_ets_2, h = 20), ny1_test))
ny1_ets_out_sample<-rbind(ny1_ets_out_sample,accuracy(forecast(ny1_ets_3, h = 20), ny1_test))
ny1_ets_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2", "Model 3", "Model 3")
ny1_ets_out_sample<- ny1_ets_out_sample %>% select(Model, everything())
ny1_ets_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | 2.0251110 | 35.05182 | 26.63933 | -0.8240822 | 10.287605 | 0.7491376 | -0.0429736 | NA |
| Test set | Model 1 | 8.6631279 | 44.59696 | 36.87301 | 0.7187325 | 12.946867 | 1.0369237 | 0.2157562 | 0.7266446 |
| Training set1 | Model 2 | -1.3569147 | 33.63185 | 25.40108 | -2.1492265 | 9.881108 | 0.7143162 | 0.0231868 | NA |
| Test set1 | Model 2 | -10.5509412 | 45.39093 | 39.71499 | -6.4144413 | 14.996952 | 1.1168445 | 0.2036839 | 0.7652000 |
| Training set2 | Model 3 | -0.1980519 | 41.82318 | 33.31233 | -2.5115689 | 13.036344 | 0.9367922 | 0.0720625 | NA |
| Test set2 | Model 3 | -29.4331996 | 60.69639 | 56.00368 | -14.4006278 | 21.732654 | 1.5749067 | 0.3355449 | 1.0179866 |
#Plot Forecast
autoplot(forecast(ny1_ets_1, h = 20))
Model 1 is therefore selected, since it gives the lowest RMSE on the test set.
checkresiduals(ny1_ets_1)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,N,M)
## Q* = 10.696, df = 5, p-value = 0.05775
##
## Model df: 9. Total lags used: 14
Model 1 also passes on all residual diagnostics, providing more confidence in this model.
As in previous cases, the timeseries is decomposed in order to inspect the seasonal, trend and error components.
ny2_stl<-ny2_train[,1]
stl(ny2_stl,s.window ="period") %>% autoplot
From the plots above, it is seen that the components are likely to be additive. However, the relatively large grey bar for the trend component indicates that it does not contribute greatly to the variations in the timeseries. Consequently, potential models to test include “A,A,A”, “A,N,A”, “A,A,N” and “A,N,N”, as the grey bar for the seasonal component also appears to be quite large.
ny2_ets_1<-ets(ny2_train, model="ZZZ")
ny2_ets_1
## ETS(A,N,A)
##
## Call:
## ets(y = ny2_train, model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.1895
## gamma = 1e-04
##
## Initial states:
## l = 215.2573
## s = 13.4722 30.6874 15.4489 18.7483 6.1767 -69.3617
## -15.1719
##
## sigma: 44.4909
##
## AIC AICc BIC
## 839.1011 842.8299 861.5861
The optimisation provides A,N,A as the best model. Therefore, the other three are considered as candidate models.
#Candidate Models
ny2_ets_2<-ets(ny2_train, model="AAA")
ny2_ets_3<-ets(ny2_train, model="AAN")
ny2_ets_4<-ets(ny2_train, model="ANN")
ny2_ets_out_sample<-as.data.frame(accuracy(forecast(ny2_ets_1, h = 18), ny2_test))
ny2_ets_out_sample<-rbind(ny2_ets_out_sample,accuracy(forecast(ny2_ets_2, h = 18), ny2_test))
ny2_ets_out_sample<-rbind(ny2_ets_out_sample,accuracy(forecast(ny2_ets_3, h = 18), ny2_test))
ny2_ets_out_sample<-rbind(ny2_ets_out_sample,accuracy(forecast(ny2_ets_4, h = 18), ny2_test))
ny2_ets_out_sample$Model<- c("Model 1","Model 1", "Model 2", "Model 2", "Model 3", "Model 3", "Model 4", "Model 4")
ny2_ets_out_sample<- ny2_ets_out_sample %>% select(Model, everything())
ny2_ets_out_sample %>% kable(caption = "Out of Sample Performance") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Model 1 | -0.5314202 | 41.53243 | 34.24962 | -13.4603626 | 26.57567 | 0.7226142 | 0.0831474 | NA |
| Test set | Model 1 | 7.9390062 | 47.20564 | 29.52803 | 0.6841513 | 12.93265 | 0.6229960 | -0.1930890 | 0.7559118 |
| Training set1 | Model 2 | -0.2728030 | 41.19026 | 33.94867 | -13.5247902 | 26.65282 | 0.7162647 | 0.0719375 | NA |
| Test set1 | Model 2 | 1.7881146 | 46.25062 | 30.78688 | -2.4625522 | 13.88885 | 0.6495557 | -0.2030599 | 0.7424033 |
| Training set2 | Model 3 | -9.8778535 | 52.40140 | 38.23240 | -27.7974478 | 38.46749 | 0.8066447 | 0.2219707 | NA |
| Test set2 | Model 3 | 63.3531198 | 87.00047 | 67.75642 | 25.3560064 | 28.79608 | 1.4295560 | 0.2200590 | 1.3949020 |
| Training set3 | Model 4 | 2.9188311 | 52.59888 | 40.41706 | -19.5549004 | 36.00171 | 0.8527376 | 0.1735691 | NA |
| Test set3 | Model 4 | -1.3356800 | 54.34285 | 37.87089 | -6.2794817 | 18.16734 | 0.7990175 | 0.0952819 | 0.8511216 |
Model 2 performs the best on the test RMSE and is therefore selected.
#Plot Forecast
autoplot(forecast(ny2_ets_2, h = 18))
checkresiduals(ny2_ets_2)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,A)
## Q* = 12.145, df = 3, p-value = 0.006904
##
## Model df: 11. Total lags used: 14
The residuals generally exhibit a white noise pattern, with insignificant lags. However, the distribution appears to be slightly skewed and the Ljung-Box test returns a significant p-value. Consequently, these diagnostics are inconclusive.
Summary of best performing models:
Firstly, a summary of the best of the ARIMA and Holt-Winters models for each store is provided below:
Cali1 Store (46673): The best of the ARIMA Models is ARIMA(1,0,0)(0,1,1)[7]. The best of the Holt-Winters models is ETS(A,N,A).
Cali2 Store (4904): The best of the ARIMA Models is ARIMA(0,1,1)(0,1,1)[7]. The best of the Holt-Winters models is ETS(A,N,A).
NY1 Store (12631): The best of the ARIMA Models is ARIMA(0,1,1)(2,0,0)[7]. The best of the Holt-Winters models is ETS(M,N,M).
NY2 Store (20974): The best of the ARIMA Models is ARIMA(0,0,1)(1,0,0)[7] with non-zero mean. The best of the Holt-Winters models is ETS(A,A,A).
Comparison between Holt-Winters and ARIMA Models: While the various models were fit using a range of information criteria (AIC, AICc, BIC), these are computed on the training data only and provide no evidence of how well the models generalise. Furthermore, since the likelihood is calculated differently for ARIMA and ETS models, their information criteria cannot be compared directly. Consequently, the in-sample and out-of-sample performance metrics calculated in earlier sections are used for the final selection - specifically, the RMSE. Ideally, the best model will provide the lowest RMSE as well as consistent performance between the training and test samples, indicating that it generalises well to unseen data (a compromise between underfitting and overfitting).
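The selection rule above can be sketched as a small helper; this is a hypothetical function, not part of the original analysis, and `rmse_gap` and `acc_tbl` are illustrative names:

```r
# Given the matrix returned by accuracy(forecast(fit, h), test_set), report the
# test RMSE together with the train-test RMSE gap; a small gap suggests the
# model generalises well, a large one suggests overfitting.
rmse_gap <- function(acc_tbl) {
  train_rmse <- acc_tbl["Training set", "RMSE"]
  test_rmse  <- acc_tbl["Test set", "RMSE"]
  c(test_rmse = test_rmse, train_test_gap = test_rmse - train_rmse)
}
# Example usage: rmse_gap(accuracy(forecast(ny2.m2, h = 18), ny2_test))
```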
cali1_overall_fit<-cali1_ets_out_sample #Create a copy to preserve original dataframe
cali1_overall_fit$Model<-c("Holt-Winters","Holt-Winters")
cali1_overall_fit<-cali1_overall_fit %>% select(Model, everything()) #Reorder
#Merge with ARIMA out of sample
cali1_overall_fit<-rbind(cali1_overall_fit,cali1_out_sample[cali1_out_sample$Model=="Model 1",])
cali1_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
cali1_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Holt-Winters | -1.8616608 | 23.22218 | 18.45667 | -4.154372 | 14.30555 | 0.6970043 | 0.0562287 | NA |
| Test set | Holt-Winters | 19.4164807 | 33.28287 | 25.92539 | 12.463420 | 18.22348 | 0.9790556 | 0.2157657 | 0.4540012 |
| Training set1 | ARIMA | -0.7949027 | 24.95910 | 18.28022 | -2.980306 | 13.47389 | 0.6903407 | 0.0052265 | NA |
| Test set1 | ARIMA | 7.6579755 | 32.69719 | 23.53935 | 1.828789 | 17.15170 | 0.8889481 | 0.1942585 | 0.4717754 |
On RMSE, the performance of the ARIMA and Holt-Winters models is comparable, including the spread of errors between the training and test sets. However, given the “approach with caution” warning for the Holt-Winters model in section 4.13 and the slightly lower test RMSE for the ARIMA, ARIMA(1,0,0)(0,1,1)[7] is selected as the final model.
Below is a comparison of the ETS(A,N,A) and ARIMA(0,1,1)(0,1,1)[7] Models.
cali2_overall_fit<-cali2_ets_out_sample[cali2_ets_out_sample$Model=="Model 2",] #Extract Relevant Rows
cali2_overall_fit$Model<-c("Holt-Winters","Holt-Winters") #Rename Rows
#Combine with ARIMA
cali2_overall_fit<-rbind(cali2_overall_fit,cali2_out_sample[cali2_out_sample$Model=="Model 1",])
cali2_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
cali2_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set1 | Holt-Winters | -3.960539 | 41.26204 | 31.63586 | -3.438702 | 10.98139 | 0.7382057 | -0.0893617 | NA |
| Test set1 | Holt-Winters | 34.049448 | 55.95950 | 44.43286 | 10.551570 | 14.25775 | 1.0368169 | -0.0241791 | 0.6328567 |
| Training set | ARIMA | -7.638800 | 44.18003 | 31.90097 | -3.637262 | 10.72865 | 0.7443920 | -0.1074127 | NA |
| Test set | ARIMA | 71.572809 | 94.50028 | 74.24301 | 21.922535 | 23.10160 | 1.7324206 | -0.3237485 | 1.0594637 |
In this case, the Holt-Winters model given by ETS(A,N,A) gives a clearly superior performance on the in sample and out-of sample sets. Therefore, ETS(A,N,A) is selected as the final model.
Below is a comparison of the ETS(M,N,M) and ARIMA(0,1,1)(2,0,0)[7] Models.
ny1_overall_fit<-ny1_ets_out_sample[ny1_ets_out_sample$Model=="Model 1",] #Extract Relevant Rows
ny1_overall_fit$Model<-c("Holt-Winters","Holt-Winters") #Rename Rows
#Combine with ARIMA
ny1_overall_fit<-rbind(ny1_overall_fit,ny1_out_sample[ny1_out_sample$Model=="Model 1",])
ny1_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
ny1_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set | Holt-Winters | 2.025111 | 35.05182 | 26.63933 | -0.8240822 | 10.28761 | 0.7491376 | -0.0429736 | NA |
| Test set | Holt-Winters | 8.663128 | 44.59696 | 36.87301 | 0.7187325 | 12.94687 | 1.0369237 | 0.2157562 | 0.7266446 |
| Training set1 | ARIMA | 2.514712 | 39.04974 | 29.64812 | -1.0358261 | 11.50951 | 0.8337491 | -0.0233680 | NA |
| Test set1 | ARIMA | 11.993375 | 48.73665 | 39.11604 | 1.5721260 | 13.42434 | 1.1000011 | 0.3046582 | 0.7664256 |
While the performance of the ARIMA and Holt-Winters models is comparable, the Holt-Winters model shows superior performance on the test-set RMSE. Therefore, ETS(M,N,M) is selected as the final model.
Below is a comparison of the ETS(A,A,A) and ARIMA(0,0,1)(1,0,0)[7] Models.
ny2_overall_fit<-ny2_ets_out_sample[ny2_ets_out_sample$Model=="Model 2",] #Extract Relevant Rows
ny2_overall_fit$Model<-c("Holt-Winters","Holt-Winters") #Rename Rows
#Combine with ARIMA
ny2_overall_fit<-rbind(ny2_overall_fit,ny2_out_sample[ny2_out_sample$Model=="Model 2",])
ny2_overall_fit[3:4,1]<-c("ARIMA","ARIMA")
ny2_overall_fit %>% kable(caption = "Performance Comparison: Holt-Winters VS ARIMA") %>% kable_styling("striped")
| Model | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U | |
|---|---|---|---|---|---|---|---|---|---|
| Training set1 | Holt-Winters | -0.272803 | 41.19026 | 33.94867 | -13.524790 | 26.65282 | 0.7162647 | 0.0719375 | NA |
| Test set1 | Holt-Winters | 1.788115 | 46.25062 | 30.78688 | -2.462552 | 13.88885 | 0.6495557 | -0.2030599 | 0.7424033 |
| Training set11 | ARIMA | 1.312231 | 48.61274 | 38.78247 | -18.941275 | 33.96957 | 0.8182503 | 0.0247216 | NA |
| Test set11 | ARIMA | -1.826844 | 49.12370 | 32.22283 | -5.550662 | 15.02731 | 0.6798520 | -0.0368099 | 0.7350267 |
While the Holt-Winters model provides the lower RMSE on the test sample, the ARIMA model exhibits a narrower spread between the training and test sets. Since the difference in test-set RMSE is small and the ARIMA model shows evidence of generalising better, it is preferred. Therefore, ARIMA(0,0,1)(1,0,0)[7] is selected as the final model.
The final models are then trained on the full datasets in order to make a forecast for the 06/16/2015 to 06/29/2015 window. The results will be presented in a separate CSV file.
#Make the original, full datasets ts objects
cali1_final<-ts(cali1[,2], frequency=7, start=c(03,05))
cali2_final<-ts(cali2[,2], frequency=7, start=c(03,13))
ny1_final<-ts(ny1[,2], frequency = 7, start=c(03,05))
ny2_final<-ts(ny2[,2], frequency = 7, start=c(03,20))
#Train final models on the full data set
#Cali1 Final Model: ARIMA(1,0,0)(0,1,1)[7]
cali1.mfinal<-Arima(cali1_final, order = c(1,0,0), seasonal = list(order = c(0, 1, 1), period=7))
#Cali2 Final Model: ETS(A,N,A)
cali2.mfinal<-ets(cali2_final, model="ANA")
#NY1 Final Model: ETS(M,N,M)
ny1.mfinal<-ets(ny1_final, model="MNM")
#NY2 Final Model: ARIMA(0,0,1)(1,0,0)[7]
ny2.mfinal<-Arima(ny2_final, order = c(0,0,1), seasonal = list(order = c(1, 0, 0), period=7))
#Forecast for 14-day window
cali1_forecast<-forecast(cali1.mfinal,h=14)
cali2_forecast<-forecast(cali2.mfinal,h=14)
ny1_forecast<-forecast(ny1.mfinal,h=14)
ny2_forecast<-forecast(ny2.mfinal,h=14)
#Report Point Forecasts in Table
#Include Dates
Dates<-as.character(seq(ymd('2015-06-16'),ymd('2015-06-29'), by = 'days'))
#Extract Mean/Point Forecast
forecast_cali1<-cali1_forecast$mean
forecast_cali2<-cali2_forecast$mean
forecast_ny1<-ny1_forecast$mean
forecast_ny2<-ny2_forecast$mean
final_prediction<-cbind(Dates,forecast_cali1,forecast_cali2,forecast_ny1,forecast_ny2)
#Rename Column Names
colnames(final_prediction)<-c("Dates","Store 46673", "Store 4904", "Store 12631", "Store 20974")
#Write CSV
final_prediction<-as.data.frame(final_prediction)
final_prediction %>% kable(caption = "Final Prediction") %>% kable_styling("striped")
| Dates | Store 46673 | Store 4904 | Store 12631 | Store 20974 |
|---|---|---|---|---|
| 2015-06-16 | 172.065673289277 | 359.29446057546 | 258.196528985547 | 223.711585280169 |
| 2015-06-17 | 175.158697394708 | 347.835410264577 | 276.195535773721 | 233.026989668809 |
| 2015-06-18 | 164.803362692132 | 308.86599554549 | 302.647535094403 | 282.886883570584 |
| 2015-06-19 | 100.789620802174 | 212.264763379943 | 307.136078143092 | 200.042752164558 |
| 2015-06-20 | 76.6759675777962 | 212.951138432589 | 233.225435635271 | 213.850107398896 |
| 2015-06-21 | 168.66757546734 | 336.596678298125 | 266.344536763498 | 198.892139228364 |
| 2015-06-22 | 176.529685366448 | 336.878467050087 | 246.864054242653 | 236.862366122792 |
| 2015-06-23 | 160.781118446457 | 359.29446057546 | 258.196587683542 | 220.643167113522 |
| 2015-06-24 | 173.271331392456 | 347.835410264577 | 276.195598563582 | 224.215975378674 |
| 2015-06-25 | 164.487696679953 | 308.86599554549 | 302.64760389782 | 243.339121685568 |
| 2015-06-26 | 100.73682498455 | 212.264763379943 | 307.136147966927 | 211.565278591037 |
| 2015-06-27 | 76.6671373640542 | 212.951138432589 | 233.225488656377 | 216.860919106792 |
| 2015-06-28 | 168.666098595151 | 336.596678298125 | 266.344597313848 | 211.123975214724 |
| 2015-06-29 | 176.529438356407 | 336.878467050087 | 246.864110364341 | 225.686986633051 |
write.csv(final_prediction, "FinalPrediction.csv")
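As a side note, a tidier construction is sketched below: building the data frame directly keeps the forecasts numeric and allows rounding, avoiding the character coercion that cbind() applies when mixing the Dates vector with numeric forecasts (which produces the full-precision strings in the table above). `final_prediction_clean` is an illustrative name.

```r
# Build the final table directly as a data frame with numeric, rounded forecasts
final_prediction_clean <- data.frame(
  Dates = Dates,
  `Store 46673` = round(as.numeric(forecast_cali1), 2),
  `Store 4904`  = round(as.numeric(forecast_cali2), 2),
  `Store 12631` = round(as.numeric(forecast_ny1), 2),
  `Store 20974` = round(as.numeric(forecast_ny2), 2),
  check.names = FALSE  # preserve the column names with spaces
)
write.csv(final_prediction_clean, "FinalPrediction.csv", row.names = FALSE)
```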